ABSTRACT

In this paper, we revisit the D-iteration algorithm in order
to better explain the different performance results that were
observed for the numerical computation of the eigenvector
associated with the PageRank score. We revisit here the
practical computation cost, based on the execution runtime,
compared to the theoretical number of iterations.

Categories and Subject Descriptors 

G.1.3 [Mathematics of Computing]: Numerical Analysis -
Numerical Linear Algebra; G.2.2 [Discrete Mathematics]:
Graph Theory - Graph algorithms

General Terms 

Algorithms, Performance 

Keywords 

Numerical computation; Iteration; Fixed point; Gauss-Seidel; 
Eigenvector. 

1. INTRODUCTION 

In this paper, we assume that the reader is already familiar
with the idea of the fluid diffusion associated with the
D-iteration [B] to solve the equation:

X = P.X + B

and with its application to the PageRank equation [7].

For a general description of alternative or existing iteration
methods, one may refer to [5].

This paper further investigates the analysis done in [8].
The main result is that, using the container vector for the
iterators, we obtain better and more predictable performance.

2. ANALYSIS OF THE COMPUTATION COST 
2.1 C++ Programming environment 

For the evaluation of the computation cost, we used Linux
(Ubuntu) machines:

Intel(R) Core(TM)2 CPU, U7600, 1.20GHz, cache size 2048
KB (Linux1, g++-4.4)

and

Intel(R) Core(TM) i5 CPU, M560, 2.67GHz, cache size
3072 KB (Linux2, g++-4.6).

The runtime has been measured with the function clock() of
the library time.h (with a precision of 10 ms). The runtime
below measures the computation time from the moment we start
the iterations (the time to build the iterators is counted
separately as Initialization time). Note that the
initialization time has not been optimized here. Using a
binary input format, we observed a gain factor of more than 5
compared to the loading times that are shown in this paper.
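
As an illustration of the measurement convention described above, here is a minimal sketch of timing a computation with clock(), accessed through <ctime>. The names cpu_seconds, time_iterations and sample_workload are hypothetical and are not taken from the code used for the experiments; the workload is only a stand-in for an iteration loop.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: CPU seconds elapsed between two clock() readings.
double cpu_seconds(std::clock_t start, std::clock_t stop) {
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}

// Hypothetical helper: runs `iterations` and returns the CPU time it used.
// Building the iterators would happen before this call and be timed
// separately (the "Initialization time" of the text).
double time_iterations(void (*iterations)()) {
    const std::clock_t start = std::clock();
    assert(start != static_cast<std::clock_t>(-1));  // clock() not available
    iterations();
    return cpu_seconds(start, std::clock());
}

// Stand-in workload for this sketch; a real run would call PI, GS, ... here.
void sample_workload() {
    volatile double sum = 0.0;
    for (int i = 0; i < 1000000; ++i) {
        sum = sum + 1.0 / (i + 1);
    }
}
```

Calling time_iterations(&sample_workload) returns a value in seconds. The effective resolution of clock() depends on the platform (10 ms in the setting reported above), so very short runs can be reported as 0.00.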

The iterators have been built based on the class vector
from the C++ standard library:

vector<int> l_out[N];
vector<int> l_in[N];

Compared to the results presented in [5], where the class list
was used, we realized that the class vector is in fact more
appropriate for the iteration schemes we use in the context
of PageRank equations. The reason is obvious for
programmers: vector is meant for variable-size arrays stored
contiguously, which optimizes the access time to the
iterator's value (no pointer has to be followed, as for list).
The results in [8] were mainly biased by the fact that the
lists had been built naturally column by column and not per
row, because of the input file structure:

#origin_node destination_node
993508
1 999978
2 999978
3 999978
5 4
6 4
6 962147
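
To make the link between this file layout and the two families of iterators concrete, the following sketch reads such a text file and fills both l_out (per column) and l_in (per row) with vector. It is an illustration under stated assumptions, not the loader used for the experiments: LinkLists and read_links are hypothetical names, lines starting with '#' are treated as comments, and links touching a node index outside [0, n) are skipped.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two families of iterators.
struct LinkLists {
    std::vector<std::vector<int> > l_out;  // l_out[i]: destinations of links leaving i
    std::vector<std::vector<int> > l_in;   // l_in[i]: origins of links entering i
};

// Reads "origin destination" pairs, one per line, for the first n nodes.
LinkLists read_links(std::istream& input, int n) {
    assert(n >= 0);
    LinkLists g;
    g.l_out.resize(n);
    g.l_in.resize(n);
    std::string line;
    while (std::getline(input, line)) {
        if (line.empty() || line[0] == '#') continue;  // header or comment
        std::istringstream fields(line);
        int origin, destination;
        if (!(fields >> origin >> destination)) continue;  // malformed line
        if (origin < 0 || origin >= n || destination < 0 || destination >= n) {
            continue;  // assumption of this sketch: ignore links outside the extract
        }
        g.l_out[origin].push_back(destination);  // column of the origin node
        g.l_in[destination].push_back(origin);   // row of the destination node
    }
    return g;
}
```

Because the file is sorted by origin node, each l_out[i] is filled in one go, whereas the entries of a given l_in[i] arrive scattered over the whole file; with vector, both end up stored contiguously once loading is finished.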

2.2 Java Programming environment 

For the evaluation of the computation cost in Java, we
used only the Linux2 (Ubuntu) machine and the JDK version:

java version "1.6.0_23"
OpenJDK Runtime Environment (IcedTea6 1.11pre)
(6b23~pre11-0ubuntu1.11.10.2)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

The runtime has been measured by calling the method
getCurrentThreadCpuTime() (for longer computations, we
checked that the value was very close to the one measured by
System.currentTimeMillis(), which has enough precision
compared to the C++ measurements). We used the same
rules to start and stop measuring as in the C++
implementation.

For performance reasons, we used arrays of primitive int
(TIntArrayList from [5]) rather than a classic collection of
Integer objects (ArrayList<ArrayList<Integer>>):

ArrayList<TIntArrayList> l_out;
ArrayList<TIntArrayList> l_in;

As opposed to the C++ implementation, the graph was
read directly from a WebGraph compressed file (see [4] and
[3]); we actually used the Java code to generate the text file
parsed by the C++ implementation.

2.3 Algorithms for evaluation 

The algorithms that we evaluated are: 

• PI: Power iteration (equivalent to Jacobi iteration), 
using row vectors; 

• PI': Power iteration (equivalent to Jacobi iteration), 
using column vectors; 

• GS: Gauss-Seidel iteration (cyclic sequence); 

• DI-CYC: D-iteration with cyclic sequence (a node i is
selected if $(F_n)_i > 0$);

• DI-SOP (sub-optimal compromise solution): D-iteration
with node selection if $(F_n)_i > r_{n'} \times \#out_i / L$,
where $\#out_i$ is the out-degree of i and $r_{n'}$, the total
remaining fluid, is computed once per cycle n'.

2.4 Notations 

• L: number of non-null entries (links) of P; 

• D: number of dangling nodes (0 out-degree nodes); 

• E: number of 0 in-degree nodes: the 0 in-degree nodes
are defined recursively: a node i whose incoming links all
come from 0 in-degree nodes is also a 0 in-degree node;
from the diffusion point of view, these nodes are the ones
that converge exactly in a finite number of steps;

• O: number of loop nodes ($p_{ii} \neq 0$);

• $max_{in} = \max_i \#in_i$ (maximum in-degree; the in-degree
of i is the number of non-null entries of the i-th row
vector of P);

• $max_{out} = \max_i \#out_i$ (maximum out-degree; the
out-degree of i is the number of non-null entries of the i-th
column vector of P).
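
The quantities listed above can be obtained in one pass over the out-link lists, plus a peeling step for E. The sketch below is only an illustration: GraphStats and graph_stats are hypothetical names, the graph is assumed to be given as l_out (one vector of destinations per node), and E is computed by repeatedly removing nodes that have no remaining incoming link, which matches the recursive definition.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical record holding the notations of this section.
struct GraphStats {
    std::size_t L, D, E, O, max_in, max_out;
};

GraphStats graph_stats(const std::vector<std::vector<int> >& l_out) {
    const std::size_t n = l_out.size();
    GraphStats s = {0, 0, 0, 0, 0, 0};
    std::vector<std::size_t> in_deg(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t out_i = l_out[i].size();
        s.L += out_i;                              // links
        if (out_i == 0) ++s.D;                     // dangling node
        if (out_i > s.max_out) s.max_out = out_i;
        bool has_loop = false;
        for (std::size_t k = 0; k < out_i; ++k) {
            const int j = l_out[i][k];
            assert(j >= 0 && static_cast<std::size_t>(j) < n);
            ++in_deg[j];
            if (static_cast<std::size_t>(j) == i) has_loop = true;
        }
        if (has_loop) ++s.O;                       // loop node
    }
    // Peel off nodes whose incoming links all come from already peeled nodes.
    std::vector<std::size_t> pending(in_deg);
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (in_deg[i] > s.max_in) s.max_in = in_deg[i];
        if (pending[i] == 0) ready.push(i);
    }
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop();
        ++s.E;                                     // 0 in-degree node
        for (std::size_t k = 0; k < l_out[i].size(); ++k) {
            const std::size_t j = l_out[i][k];
            if (--pending[j] == 0) ready.push(j);
        }
    }
    return s;
}
```

A node on a cycle (including a loop node) is never peeled, so it is never counted in E, in line with the remark that 0 in-degree nodes are the ones converging in finitely many diffusion steps.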

2.5 Pseudo-codes 

The target error we considered here is 1/N.

out[i] := out-degree of node i;
in[i] := in-degree of node i;
l_out[i] := iterator for column i;
l_in[i] := iterator for row i.

2.5.1 Power iteration per row: PI 

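// Start from the uniform vector.
for (int i = 0; i < N; i++){
  x_old[i] = 1.0/N;
}

Loop:

while ( error > target_error ){
  for (int i = 0; i < N; i++){
    x_new[i] = (1-d)/N;
  }
  // Row i gathers the contributions of the nodes pointing to i.
  for (int i = 0; i < N; i++){
    for (vector<int>::iterator j=l_in[i].begin();
         j!=l_in[i].end(); j++){
      x_new[i] += d * x_old[*j]/out[*j];
    }
  }
  // The mass lost on dangling nodes is redistributed uniformly.
  error = 0.0;
  for (int i = 0; i < N; i++){
    error += x_new[i];
  }
  for (int i = 0; i < N; i++){
    x_new[i] += (1.0-error)/N;
  }
  // L1 distance between successive iterates, scaled by d/(1-d).
  error = 0.0;
  for (int i = 0; i < N; i++){
    error += fabs(x_new[i]-x_old[i]);
  }
  error *= d/(1-d);
  // The new iterate becomes the current one for the next pass.
  for (int i = 0; i < N; i++){
    x_old[i] = x_new[i];
  }
}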
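
The final scaling by d/(1-d) in this listing can be read as the usual a posteriori bound for a contraction. The following is a sketch under two assumptions: the L1 norm is used, and $\bar{P}$ (a notation that is not used elsewhere in this paper) denotes the column-stochastic matrix obtained once the mass lost on dangling nodes is redistributed uniformly, so that one pass computes $x_{n+1} = d\bar{P}x_n + \frac{1-d}{N}\mathbf{1}$ with limit $X$.

```latex
\begin{align*}
\|x_{n+k+1} - x_{n+k}\|_1
  &= \|d\bar{P}\,(x_{n+k} - x_{n+k-1})\|_1
   \le d^{k}\,\|x_{n+1} - x_n\|_1 , \qquad k \ge 1, \\
\|X - x_{n+1}\|_1
  &\le \sum_{k \ge 1} \|x_{n+k+1} - x_{n+k}\|_1
   \le \sum_{k \ge 1} d^{k}\,\|x_{n+1} - x_n\|_1
   = \frac{d}{1-d}\,\|x_{n+1} - x_n\|_1 .
\end{align*}
```

With x_new playing the role of $x_{n+1}$ and x_old that of $x_n$, the loop therefore stops once this bound on the distance to the limit is below the target error.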

2.5.2 Power iteration per column: PI' 

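// Start from the uniform vector.
for (int i = 0; i < N; i++){
  x_old[i] = 1.0/N;
}

Loop:

while ( error > target_error*(1-d)/d ){
  for (int i = 0; i < N; i++){
    x_new[i] = (1-d)/N;
  }
  // Column i pushes its share to every node it points to.
  for (int i = 0; i < N; i++){
    transit = d * x_old[i]/out[i];
    for (vector<int>::iterator j=l_out[i].begin();
         j!=l_out[i].end(); j++){
      x_new[*j] += transit;
    }
  }
  // The mass lost on dangling nodes is redistributed uniformly.
  error = 0.0;
  for (int i = 0; i < N; i++){
    error += x_new[i];
  }
  for (int i = 0; i < N; i++){
    x_new[i] += (1.0-error)/N;
  }
  // L1 distance between successive iterates, scaled by d/(1-d).
  error = 0.0;
  for (int i = 0; i < N; i++){
    error += fabs(x_new[i]-x_old[i]);
  }
  error *= d/(1-d);
  // The new iterate becomes the current one for the next pass.
  for (int i = 0; i < N; i++){
    x_old[i] = x_new[i];
  }
}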
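
PI and PI' compute the same product and differ only in the traversal order: PI reads along rows (gather), PI' writes along columns (scatter). The sketch below makes this explicit on one pass without the dangling-node correction. The names Lists, step_by_rows and step_by_columns are hypothetical, and the containers are passed as std::vector rather than as the arrays of the listings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<int> > Lists;

// Gather form (PI): row i reads x_old at every node that links to i.
std::vector<double> step_by_rows(const Lists& l_in, const std::vector<int>& out,
                                 const std::vector<double>& x_old, double d) {
    const std::size_t n = x_old.size();
    std::vector<double> x_new(n, (1.0 - d) / n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < l_in[i].size(); ++k) {
            const int j = l_in[i][k];
            x_new[i] += d * x_old[j] / out[j];  // out[j] >= 1 since j links to i
        }
    }
    return x_new;
}

// Scatter form (PI'): column i pushes its share to every node it links to.
std::vector<double> step_by_columns(const Lists& l_out,
                                    const std::vector<double>& x_old, double d) {
    const std::size_t n = x_old.size();
    std::vector<double> x_new(n, (1.0 - d) / n);
    for (std::size_t i = 0; i < n; ++i) {
        if (l_out[i].empty()) continue;  // dangling node: nothing to push
        const double transit = d * x_old[i] / l_out[i].size();
        for (std::size_t k = 0; k < l_out[i].size(); ++k) {
            x_new[l_out[i][k]] += transit;
        }
    }
    return x_new;
}
```

In the row form each x_new[i] is written once and x_old is read at scattered positions; in the column form x_old[i] is read once and x_new is written at scattered positions, with the division by the out-degree done once per node instead of once per link.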

2.5.3 Gauss-Seidel: GS 

// Start from the constant term (1-d)/N.
for (int i = 0; i < N; i++){
  x[i] = (1-d)/N;
}

while ( error > target_error ){
  error = 0.0;
  for (int i = 0; i < N; i++){
    previous = x[i];
    x[i] = (1-d)/N;
    diag = 1.0;
    for (vector<int>::iterator j=l_in[i].begin();
         j!=l_in[i].end(); j++){
      if ( *j != i ){
        // Uses the entries of x already updated in this sweep.
        x[i] += d * x[*j]/out[*j];
      } else {
        // Loop node: the self-link goes to the diagonal term.
        diag -= d/out[i];
      }
    }
    x[i] /= diag;
    error += x[i] - previous;
  }
  // e: value currently held by the dangling nodes.
  e = 0.0;
  for (int i = 0; i < N; i++){
    if ( out[i] == 0 )
      e += x[i];
  }
  error = error*d/(1 - d - d*e);
}

2.5.4 D-iteration by cycle: DI-CYC 

for (int i = 0; i < N; i++){
  hist[i] = 0.0;
  fluid[i] = (1-d)/N;
}

e = 0.0;

while ( error > target_error ){
  for (int i=0; i<N; i++){
    if ( fluid[i] > 0 ){
      // loop[i] == 1: node i is a loop node (p_ii != 0).
      if ( loop[i] == 1 ){
        // The fluid sent back to i is summed as a geometric
        // series: 1/(1 - d/out[i]).
        transit = fluid[i]*out[i]/(out[i]-d);
      } else {
        transit = fluid[i];
      }
      hist[i] += transit;
      fluid[i] = 0.0;
      // e: fluid diffused by the dangling nodes.
      if ( out[i] == 0 )
        e += transit;
      double sent = transit*d/out[i];
      for (vector<int>::iterator j=l_out[i].begin();
           j!=l_out[i].end(); j++){
        if ( *j != i ){
          fluid[*j] += sent;
        }
      }
    }
  }
  // The remaining fluid gives the error estimate.
  error = 0.0;
  for (int i=0; i < N; i++){
    error += fluid[i];
  }
  error /= (1 - d - d*e);
}
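
The roles of hist and fluid can be summarized by an invariant. The following is a sketch under simplifying assumptions that are narrower than the setting of the listing: no dangling nodes and no loop nodes, $P$ the column-normalized link matrix, $B = \frac{1-d}{N}\mathbf{1}$, $H_n$ and $F_n$ standing for hist and fluid after $n$ diffusions, $e_i$ the i-th unit vector, and $X$ the solution of $X = dPX + B$.

```latex
\begin{align*}
&\text{diffusion of node } i \text{ with } f = (F_n)_i: \qquad
  H_{n+1} = H_n + f\,e_i, \qquad
  F_{n+1} = F_n - f\,e_i + d\,f\,P e_i, \\
&H_n + F_n = B + dP\,H_n \quad \text{for all } n
  \;\Longrightarrow\; X - H_n = (I - dP)^{-1} F_n, \\
&\|X - H_n\|_1 = \frac{\|F_n\|_1}{1-d} .
\end{align*}
```

Under these assumptions, the remaining fluid divided by 1-d is exactly the L1 distance to the limit. The listing divides by 1 - d - d*e instead of 1 - d, where e accumulates the fluid diffused by the dangling nodes.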

2.5.5 D-iteration based on the average diffusion cost: DI-SOP

Same as for DI-CYC, replacing the condition:

if ( fluid[i] > 0 )

by

// r: total remaining fluid, computed once per cycle
r = 0.0;
for (int i=0; i < N; i++){
  r += fluid[i];
}

if ( fluid[i] > r/L*out[i] )
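
To show how the two selection rules fit into one loop, here is a self-contained sketch that runs either DI-CYC or DI-SOP on a small graph. It is not the code used for the measurements: d_iteration is a hypothetical name, every node is assumed to have at least one outgoing link (so the dangling-node term e is not needed), loop nodes are diffused like any other node instead of using the geometric-series shortcut, and a fallback to the cyclic rule is added as a safeguard for the case where no node passes the DI-SOP test during a cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns hist[] once the remaining fluid divided by
// (1-d) is below target_error. Assumes no dangling nodes and 0 < d < 1.
std::vector<double> d_iteration(const std::vector<std::vector<int> >& l_out,
                                double d, double target_error, bool use_sop) {
    const std::size_t n = l_out.size();
    std::size_t links = 0;  // L in the text
    for (std::size_t i = 0; i < n; ++i) {
        assert(!l_out[i].empty());
        links += l_out[i].size();
    }
    std::vector<double> hist(n, 0.0);
    std::vector<double> fluid(n, (1.0 - d) / n);
    double remaining = 1.0 - d;  // r: sum of fluid[], refreshed once per cycle
    while (remaining / (1.0 - d) > target_error) {
        bool moved = false;
        for (std::size_t i = 0; i < n; ++i) {
            const double out_i = static_cast<double>(l_out[i].size());
            // DI-CYC: any positive fluid. DI-SOP: above r/L * out[i].
            const double threshold = use_sop ? remaining / links * out_i : 0.0;
            if (fluid[i] > threshold) {
                const double sent = fluid[i] * d / out_i;
                hist[i] += fluid[i];
                fluid[i] = 0.0;
                for (std::size_t k = 0; k < l_out[i].size(); ++k) {
                    fluid[l_out[i][k]] += sent;
                }
                moved = true;
            }
        }
        // Safeguard of this sketch only: if the DI-SOP test selected no
        // node, continue with the cyclic rule so that progress is made.
        if (!moved) use_sop = false;
        remaining = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            remaining += fluid[i];
        }
    }
    return hist;
}
```

Both rules converge to the same vector; they differ in how many diffusions are spent to get there, which is what the "nb iter" rows of the tables compare.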



N       L/N     D/N     E/N     O/N     max_in  max_out
10^3    12.9    0.041   0.032   0.236   716     130
10^4    12.5    0.008   0.145   0.114   7982    751
10^5    31.4    0.027   0.016   0.175   34764   3782
10^6    41.2    0.046   [...]   0.204   403441  4655

Table 1: Extracted graph: N = 10^3 to 10^6.



2.6 Dataset 1 

In this section, we use the web graph imported from the
dataset uk-2007-05@1000000 (available on pQ), which has
41,247,159 links on 10^6 nodes.

Below we vary N from 10^3 to 10^6, extracting from the
dataset the information on the first N nodes. A few graph
properties are summarized in Table 1.

In Tables 2 and 3, we present the results obtained with
Linux1 and Linux2:

• the prediction by the number of iterations is quite good
for GS;

• the prediction by the number of iterations is quite good
for DI-CYC and DI-SOP when the compiler optimization
is not used;

• PI' is much better than GS with compiler optimization;

• PI' and GS are close without compiler optimization;

• the compiler optimization can bring a speed-up factor
(time2/time1) of 4-15; the gain factor (9-15 for Linux1,
6-17 for Linux2) for column-vector based methods (PI',
DI-CYC, DI-SOP) is larger than the gain (4 for Linux1,
5 for Linux2) for row-vector based methods (PI, GS).

Using Java, we obtained similar results (Table 4).

2.7 Dataset 1bis

We considered here the same dataset as Dataset 1, but
for P^t (the transposed matrix, which means that we swap
incoming and outgoing links). In Tables 5 and 6, we present
the results obtained for P^t with Linux1 and Linux2:

• the prediction by the number of iterations is not bad
for GS;

• the prediction by the number of iterations is still good
for DI-CYC and DI-SOP when the compiler optimization
is not used;

• PI' is still much better than GS with compiler
optimization;

• PI' and GS are still close without compiler optimization;

• the compiler optimization can bring a speed-up factor
(time2/time1) of 4-16; the gain factor for column-vector
based methods (PI': 11-16, DI-CYC and DI-SOP: 5-9)
is larger than the gain for row-vector based
methods (PI and GS: 4-5).





                PI      PI'     GS      DI-CYC  DI-SOP
N = 10^3. Init: 0.05s
nb iter         28      28      18.7    17.5    11.1
speed-up        1       1.0     1.5     1.6     2.5
time1 (s)       0.02    0.01    0.02    0.01    0.00
time2 (s)       0.05    0.03    0.05    0.03    0.02
N = 10^4. Init: 0.2s
nb iter         43      43      30.7    26.4    12.0
speed-up        1       1.0     1.4     [...]   3.6
time1 (s)       0.15    0.04    0.12    0.06    0.02
speed-up        1       3.8     1.3     2.5     7.5
time2 (s)       0.64    0.43    0.52    0.35    0.16
speed-up        1       1.5     1.2     1.8     [...]
time2/time1     4       11      4       6       [...]
N = 10^5. Init: 3s
nb iter         52      52      36.8    34.7    14.3
speed-up        1       1.0     [...]   [...]   3.6
[...]   4.5   [...]   4.2   [...]   1.8   [...]   15   [...]
N = 10^6. Init: [...]
nb iter         66      66      41.8    39.8    14.6
speed-up        1       [...]   1.6     1.7     4.5
time1 (s)       75      13      51      16      6.3
speed-up        1       5.8     1.5     4.7     11.9
time2 (s)       296     199     207     144     54
speed-up        1       1.5     1.4     2.1     5.5
time2/time1     4       15      4       9       9

Table 2: Linux1: Comparison of the runtime for a
target error of 1/N. Speed-up: gain factor w.r.t.
PI. time1: with compiler optimization. time2: no
compiler optimization.

                PI      PI'     GS      DI-CYC  DI-SOP
[...]
N = 10^5. Init: [...]
nb iter         52      52      36.8    34.7    14.3
speed-up        1       1.0     1.4     1.5     3.6
time1 (s)       1.3     0.32    0.98    0.46    0.23
speed-up        1       4.1     1.3     2.8     5.7
time2 (s)       [...]   4.7     4.5     3.5     1.6
speed-up        1       1.3     1.4     1.7     3.8
time2/time1     5       15      5       8       [...]
N = 10^6. Init: [...]
nb iter         66      66      41.8    39.8    14.6
speed-up        1       1.0     1.6     1.7     4.5
time1 (s)       21      5.6     14      6.4     3.1
speed-up        1       3.8     1.5     3.3     6.8
time2 (s)       97      78      67      53      21
speed-up        1       1.2     1.4     1.8     4.6
time2/time1     5       14      5       8       7

Table 3: Linux2: Comparison of the runtime for a
target error of 1/N. Speed-up: gain factor w.r.t.
PI. time1: with compiler optimization. time2: no
compiler optimization.

                PI      PI'     GS      DI-CYC  DI-SOP
N = 10^3. Init: 1.9s
nb iter         [...]   [...]   18.7    17.5    11.1
speed-up        [...]
time (s)        0.00    0.00    [...]
N = 10^4. Init: 2.0s
nb iter         43      43      [...]   26.4    11.8
speed-up        1       1.0     [...]
time (s)        [...]
speed-up        [...]
N = 10^5. Init: [...]
nb iter         [...]   [...]   36.8    [...]   14.4
speed-up        1       1.0     [...]   [...]   3.6
time (s)        1.26    0.45    0.91    0.43    0.33
speed-up        [...]
N = 10^6. Init: 2.8s
nb iter         66      66      41.8    39.8    14.6
speed-up        1       1.0     1.6     1.7     4.5
time (s)        21.3    7.85    13.33   6.2     4.0
speed-up        1       2.7     1.6     3.4     5.3

Table 4: Linux2 in Java: Comparison of the runtime
for a target error of 1/N. Speed-up: gain factor
w.r.t. PI. Similar to Table 3 but in Java.



• the gain factors are globally more stable than we expected
compared to the results for P (we expected worse
results): this suggests that the performance of the
D-iteration approaches is quite stable w.r.t. the variance
of the in-degree/out-degree.

The results with Java are shown in Table 7. In the Java
implementation, the init phase was almost constant (from
1.9s with N = 10^3 to 2.9s with N = 10^6) when varying N,
because we always read the full graph file (10^6 nodes).

2.8 Dataset 2 

Below, we used the web graph gr0.California (available
on http://www.cs.cornell.edu/Courses/cs685/2002fa/).
The main motivation here was to try to understand the
unexpectedly large gain observed in [7] for this graph.

As pointed out in [8] (Table 8), this graph is very specific
in that more than 90% of the nodes are 0 in-degree
nodes. The runtime is here too short to make a comparison.

Table 10 presents the results of the computation cost
associated with the matrix P^t for comparison.

3. CONCLUSION 

In this paper, we revisited the D-iteration method with
a practical consideration of the computation cost to solve
the PageRank equation for web graphs: the use of the class
vector for the iterators produced much faster results, with
performance that is closer to expectations.

Table 5: P^t on Linux1: Comparison of the runtime
for a target error of 1/N. Speed-up: gain factor
w.r.t. PI. time1: with compiler optimization. time2:
no compiler optimization.

Table 6: P^t on Linux2: Comparison of the runtime
for a target error of 1/N. Speed-up: gain factor
w.r.t. PI. time1: with compiler optimization. time2:
no compiler optimization.





                PI      PI'     GS      DI-CYC  DI-SOP
N = 10^3. Init: 1.9s
nb iter         30      30      22.6    21.1    12.2
speed-up        1       1.0     1.3     1.4     2.5
time (s)        0.01    0.00    0.01    0.00    0.00
N = 10^4. Init: 2.0s
nb iter         40      40      30.7    28.3    11.4
speed-up        1       1.0     1.3     1.4     3.5
time (s)        0.05    0.02    0.04    0.03    0.02
speed-up        1       2.5     1.25    1.7     2.5
N = 10^5. Init: 2.1s
nb iter         51      51      37.8    35.2    17.4
speed-up        1       1       1.3     1.4     2.9
time (s)        1.23    0.50    0.92    0.46    0.38
speed-up        1       2.5     1.3     2.7     3.2
N = 10^6. Init: 2.9s
nb iter         78      78      45.8    43.1    17.2
speed-up        1       1.0     1.7     1.8     4.5
time (s)        25.6    9.7     15.3    7.4     5.1
speed-up        1       2.6     1.7     3.5     5.0

Table 7: P^t on Linux2 in Java: Comparison of the
runtime for a target error of 1/N. Speed-up: gain
factor w.r.t. PI. Similar to Table 6 but in Java.



L/N     D/N     E/N     O/N     max_in  max_out
1.67    0.48    0.91    [...]   199     164

Table 8: N = 9664.





                PI      PI'     GS      DI-CYC  DI-SOP
N = 9664. Init: 0.1s
nb iter         43      43      22      3.1     1.8
speed-up        1       1.0     2.0     14      24
time1 (s)       0.03    0.02    0.04    0.01    0.01
speed-up        1       1.5     0.8     3.0     3.0
time2 (s)       0.16    0.12    0.09    0.04    0.03
speed-up        1       1.3     1.8     4.0     5.3
time2/time1     5       6       2       4       3

Table 9: P. time1: with compiler optimization,
time2: without compiler optimization.





                PI      PI'     GS      DI-CYC  DI-SOP
N = 9664. Init: 0.1s
nb iter         28      28      16      5.6     2.0
speed-up        1       1.0     1.8     5.0     14
time1 (s)       0.04    0.01    0.03    0.01    0.01
speed-up        1       4       1.3     4       4
time2 (s)       0.11    0.07    0.08    0.04    0.03
speed-up        1       1.6     1.4     2.8     3.7
time2/time1     3       7       3       4       3

Table 10: P^t. time1: with compiler optimization,
time2: without compiler optimization.